Abstract

A popular model selection approach for generalized linear mixed-effects 
models is the Akaike information criterion, or AIC. Among others, 
pointed out the distinction between the marginal and conditional inference 
depending on the focus of research. The conditional AIC was derived for
the linear mixed-effects model and was later generalized by [3]. We
show that a similar strategy extends to Poisson regression with random
effects, where the conditional AIC can be obtained based on our observations.
Simulation studies demonstrate the usage of the criterion. 

1 Introduction 

Generalized linear models (GLM) are powerful modelling tools that have gained 
popularity in statistics. They have wide applications in medical studies, pattern
classification, sample surveys, etc. The scope of GLM can be greatly expanded by
the incorporation of random effects. For example, in typical longitudinal stud- 
ies, a model with random effects not only models individual characteristics, but 
attempts to extrapolate to the entire population as well. It takes into account 
both within cluster and between cluster variations in the study. Model selection 
in GLM is typically achieved using AIC or BIC combined with step-wise proce- 
dures. With fixed-effects models, the definition of AIC is straightforward using 
the likelihood penalized by a term that depends on the number of parameters. 
When random effects come into play, it is not entirely clear how the number of 
parameters in the model should be defined. Based on other previous works such 
as [ij and 0], Q made distinction between marginal and conditional inference 
and provided a formal definition of conditional Akaike information, cAI, which 
gives a theoretical justification for some previous approaches. They derived an
unbiased estimator of cAI, called conditional AIC or cAIC, when the covariance
matrix of random effects is known. [3] derives a more general cAIC that
dispenses with such strong assumptions. In the definition of cAI was given for
general mixed-effects models but the unbiased estimator was only derived for 
linear mixed-effects models. A general approach of getting an unbiased estima- 
tor of cAI for generalized linear mixed-effects models (GLMM) seems to be out 
of reach. In this note, we propose an unbiased estimator of cAI for Poisson re- 
gression with random effects. The nature of Poisson regression is very different 
from the linear model since the responses are discrete. However, it turns out
that an unbiased cAIC exists, although it is derived in a different way.

2 Conditional AIC for count data 

Suppose we have some count responses $\{y_i\}$, $i = 1, \dots, m$, from $m$ clusters that
we want to model in relation to covariates $X_i$ and $Z_i$, with $y_i$ an $n_i \times 1$ vector
from cluster $i$, and $X_i$, $Z_i$ the $n_i \times p$ and $n_i \times q$ matrices associated with fixed and
random effects respectively. We use a Poisson GLMM with the canonical link:

$$\begin{aligned}
y_i &\sim \mathrm{Pois}(\mu_i), \\
\log \mu_i &= X_i \beta + Z_i b_i, \qquad b_i \sim N(0, G),
\end{aligned} \tag{1}$$

where $\beta$ is a $p \times 1$ vector of fixed effects and $b_i$ is a $q \times 1$ vector of random
effects following a mean zero Gaussian distribution with unknown covariance
matrix $G$. The total number of observations is thus $N = \sum_{i=1}^{m} n_i$. Let $\theta$ be
the population parameters in the model, including $\beta$ and the parameters in $G$.
The marginal likelihood is $g(y|\theta) = \int g(y|b,\theta)\, g(b|G)\, db$, where $g(y|b,\theta)$ is the
Poisson likelihood conditional on the random effects and $g(b|G)$ is the density
of the random effects. Sometimes it is more convenient to represent (1) in the
condensed form

$$y \sim \mathrm{Pois}(\mu), \qquad \log \mu = X\beta + Zb,$$

where $y = (y_1^T, \dots, y_m^T)^T$ is an $N \times 1$ vector of count responses, $X = (X_1^T, \dots, X_m^T)^T$, $Z =
\mathrm{diag}(Z_1, \dots, Z_m)$ and $b = (b_1^T, \dots, b_m^T)^T$.
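The condensed form can be made concrete with a small sketch. The code below stacks per-cluster design matrices into $X$ and the block-diagonal $Z$ and evaluates the Poisson mean under the log link. The cluster sizes, parameter values and function names are hypothetical choices for illustration only.

```python
import numpy as np
from scipy.linalg import block_diag


def stack_clusters(X_list, Z_list):
    """Build the condensed-form matrices from per-cluster X_i and Z_i."""
    X = np.vstack(X_list)        # N x p, clusters stacked row-wise
    Z = block_diag(*Z_list)      # N x (m*q), one diagonal block per cluster
    return X, Z


def poisson_mean(X, Z, beta, b):
    """Mean vector mu = exp(X beta + Z b) under the canonical log link."""
    return np.exp(X @ beta + Z @ b)


# Hypothetical toy sizes: m = 3 clusters with n_i = 2, 3, 2, p = 2, q = 1.
rng = np.random.default_rng(0)
sizes = [2, 3, 2]
X_list = [np.column_stack([np.ones(k), np.arange(1.0, k + 1)]) for k in sizes]
Z_list = [np.ones((k, 1)) for k in sizes]     # random intercept only
X, Z = stack_clusters(X_list, Z_list)

beta = np.array([1.0, 0.2])                   # illustrative fixed effects
b = rng.normal(0.0, 0.5, size=Z.shape[1])     # one random intercept per cluster
mu = poisson_mean(X, Z, beta, b)
y = rng.poisson(mu)                           # one draw of the count responses
```

With a random intercept only, each row of $Z$ has a single one marking the cluster of that observation, so $Zb$ simply repeats $b_i$ within cluster $i$.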

In marginal inference, the focus is on the population parameters and the 
random effects are just a mechanism for modelling the correlations within the 
clusters. The standard AIC being used refers to this case and is called marginal 
AIC, mAIC, by 0, defined by $-2\log g(y|\hat\theta(y)) + 2K$, where $K$ is the dimension of
$\theta$. This penalty term is there to correct the bias caused by using the same data to
estimate $\theta$ as well as to evaluate the marginal likelihood $g(y|\theta)$. The AIC is
designed to approximate the Akaike information, $\mathrm{AI} = -2E_{f(y)} E_{f(y^*)} \log g(y^*|\hat\theta(y))$,
where $y^*$ is an independent replicate of $y$ coming from the same true distribution
$f(y)$, which might not be contained within the family defined by (1).
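The marginal likelihood has no closed form for the Poisson GLMM, so computing mAIC requires a numerical approximation of the integral over the random effects. The following is a minimal sketch for the random-intercept case only, using Gauss-Hermite quadrature as one possible approximation (a choice made here for illustration; other approximations such as Laplace could be substituted). The function names and toy data are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import gammaln, logsumexp


def marginal_loglik(beta, sigma_b, y, X, cluster, n_nodes=30):
    """Approximate log g(y | theta) for a random-intercept Poisson model."""
    nodes, weights = hermgauss(n_nodes)        # rule for the weight exp(-t^2)
    b_vals = np.sqrt(2.0) * sigma_b * nodes    # change of variable b = sqrt(2)*sigma_b*t
    eta = X @ beta
    total = 0.0
    for c in np.unique(cluster):
        idx = cluster == c
        lin = eta[idx][:, None] + b_vals[None, :]          # n_i x n_nodes
        # Conditional Poisson log-likelihood of the cluster at each node.
        logp = (y[idx][:, None] * lin - np.exp(lin)
                - gammaln(y[idx] + 1.0)[:, None]).sum(axis=0)
        # log of (1/sqrt(pi)) * sum_k w_k * exp(logp_k)
        total += logsumexp(logp + np.log(weights)) - 0.5 * np.log(np.pi)
    return total


def maic(marginal_loglik_hat, n_params):
    """mAIC = -2 log g(y | theta_hat) + 2K, with K the dimension of theta."""
    return -2.0 * marginal_loglik_hat + 2.0 * n_params


# Hypothetical toy data: 4 clusters of 3 observations each.
rng = np.random.default_rng(1)
cluster = np.repeat(np.arange(4), 3)
X = np.column_stack([np.ones(12), np.tile(np.arange(1.0, 4.0), 4)])
y = rng.poisson(np.exp(X @ np.array([1.0, 0.2])))

# Evaluated at fixed parameter values for brevity; in practice these would be
# the marginal maximum likelihood estimates.
ll = marginal_loglik(np.array([1.0, 0.2]), 0.5, y, X, cluster)
print(maic(ll, n_params=3))
```

As a sanity check, letting `sigma_b` shrink to zero makes the quadrature collapse to the ordinary Poisson log-likelihood.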

In conditional inference, the focus is on the cluster and the estimation of 
the random effects is of interest. The prediction in this case refers to new
responses from the same clusters. Suppose the true distribution of $y$ is $f(y, u) =
f(y|u)p(u)$, where $u$ is the true random effects with density $p(u)$. Following
the conditional AI is naturally defined as

$$\mathrm{cAI} = -2 E_{f(y,u)} E_{f(y^*|u)} \log g(y^*|\hat\theta(y), \hat b(y)),$$

where $y^*$ is independent of $y$, generated from the same conditional distribution
$f(\cdot|u)$. Similar to AI, cAI cannot be directly calculated since the true
distribution $f$ is unknown. For linear mixed-effects models, unbiased estimators
were derived in Q and [3]. To the best of our knowledge, no unbiased estimator
has been proposed for other GLMMs. The following theorem gives an unbiased
estimator of cAI for Poisson regression, and the proof is given in the Appendix.

Theorem 1. Assume that the count responses have the true distribution $y \sim
\mathrm{Pois}(\mu_0)$, where $\mu_0 = (\mu_{01}, \dots, \mu_{0N})$ is the mean of the Poisson distribution
and depends on some covariates as well as the random effects $u$. The data
are modelled by (1) with conditional likelihood denoted by $g(y|\theta, b)$. For any
estimators $\hat\theta(y)$ and $\hat b(y)$, an unbiased estimator of the cAI is

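$$\mathrm{cAIC} = -2 \log g(y|\hat\theta, \hat b) + 2K,$$

where $K$ is given by

$$K = \sum_{i=1}^{N} \left\{ y_i \log[\hat y_i(y)] - y_i \log[\hat y_i(y^{(i,\,y_i-1)})] \right\},$$

with $\hat y_i(\cdot)$ the $i$-th fitted response computed from the data in its argument. Here
$y^{(i,\,y_i-1)}$ is the same as $y$ except its $i$-th component is replaced by $y_i - 1$, and
$y_i \log[\hat y_i(y^{(i,\,y_i-1)})] = 0$ when $y_i = 0$ by convention.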
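The penalty in Theorem 1 can be computed by refitting the model once for every nonzero observation, each time with that observation reduced by one. The sketch below does this for an arbitrary fitting routine `fit` that maps a response vector to fitted means. Because the theorem allows any estimator, the example uses a simple stand-in: a penalized Poisson fit with the random-effect standard deviation held fixed. That estimator, the helper names and the toy data are assumptions made here for illustration, not the procedure used in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def make_penalized_fit(X, Z, sigma_b):
    """Return fit(y) -> fitted means from a penalized Poisson likelihood.

    Hypothetical stand-in estimator: maximises the Poisson log-likelihood
    minus b'b / (2 sigma_b^2), with sigma_b treated as fixed.
    """
    p = X.shape[1]
    D = np.hstack([X, Z])

    def fit(y):
        def objective(par):
            eta = np.clip(D @ par, -30.0, 30.0)   # numerical guard for exp
            mu = np.exp(eta)
            b = par[p:]
            value = np.sum(mu - y * eta) + 0.5 * np.sum(b ** 2) / sigma_b ** 2
            grad = D.T @ (mu - y)
            grad[p:] += b / sigma_b ** 2
            return value, grad

        res = minimize(objective, np.zeros(D.shape[1]), jac=True,
                       method="L-BFGS-B",
                       options={"maxiter": 1000, "ftol": 1e-14, "gtol": 1e-9})
        return np.exp(D @ res.x)

    return fit


def caic_penalty(y, fit):
    """K = sum_i y_i * (log fit_i(y) - log fit_i(y with y_i reduced by 1))."""
    y = np.asarray(y, dtype=float)
    log_full = np.log(fit(y))
    K = 0.0
    for i in np.flatnonzero(y > 0):       # terms with y_i = 0 vanish by convention
        y_pert = y.copy()
        y_pert[i] -= 1.0
        K += y[i] * (log_full[i] - np.log(fit(y_pert)[i]))
    return K


def caic(y, fit):
    """cAIC = -2 * conditional Poisson log-likelihood + 2K."""
    y = np.asarray(y, dtype=float)
    mu_hat = fit(y)
    cond_loglik = np.sum(y * np.log(mu_hat) - mu_hat - gammaln(y + 1.0))
    return -2.0 * cond_loglik + 2.0 * caic_penalty(y, fit)


# Hypothetical toy data: 10 clusters of 5 observations, random intercept.
rng = np.random.default_rng(0)
m, n_i = 10, 5
cluster = np.repeat(np.arange(m), n_i)
x = np.tile(np.arange(1.0, n_i + 1), m)
X = np.column_stack([np.ones(m * n_i), x])
Z = np.eye(m)[cluster]
y = rng.poisson(np.exp(1.0 + 0.2 * x + rng.normal(0.0, 0.5, m)[cluster]))

fit = make_penalized_fit(X, Z, sigma_b=0.5)
K = caic_penalty(y, fit)
print(K, caic(y, fit))
```

The cost is one refit per nonzero observation, which is the computational burden of the criterion; warm-starting each refit from the full-data solution would be a natural refinement.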

Remark 1. Although the derivation of the unbiased estimator for cAI is
different from that for the linear model, with the latter derived by integration by parts,
the results have some resemblance with each other. For linear models,
$K$ is given by $\sum_i \partial \hat y_i / \partial y_i$ and the partial derivatives are estimated by finite
difference. Our $K$ for Poisson regression bears a similarity in that it depends
on the difference between $\hat y_i$ and $\hat y_i(y^{(i,\,y_i-1)})$, the fitted responses after perturbing
the original observations.

Remark 2. In Theorem 1, we only need to assume that the true model is 
in the Poisson family with means depending on some random effects u, which 
might also be different from the modelled random effects b. Thus the true model 
does not have to be included in the candidate model family. Besides, we are not 
assuming anything about the estimators $\hat\theta$ and $\hat b$, and they can be any reasonable
estimators used in the literature.

3 Simulation study 

We conducted a simulation study to investigate the properties of our unbiased 
cAIC estimator and demonstrate the difference between marginal and conditional
inference. We simulate data from model (1) with a random intercept:

$$\log \mu_{ij} = \beta_0 + \beta_1 x_j + b_i, \qquad i = 1, \dots, m = 10, \quad j = 1, \dots, n_i,$$

where $\beta_0 = 1$, $\beta_1 = 0.2$, $x_j = j$ and $b_i \sim N(0, \sigma_b^2)$. In our simulation, we consider
$n_i = 5$ and $n_i = 15$ with $\sigma_b = 0.25$, 0.5 and 1. For each of the six specifications,
500 data sets are generated. We compare the estimate $K$ used in cAIC with the true bias correction, BC,
defined by

$$\mathrm{BC} = E_{f(y,u)} \log g(y|\hat\theta, \hat b) - E_{f(y,u)} E_{f(y^*|u)} \log g(y^*|\hat\theta, \hat b),$$

which is estimated by simulation with another 500 independent sets of $y^*$
generated from the true conditional distribution $f(\cdot|u)$, sharing the
random effects $u$ with the current responses.

Table 1: Comparison of bias correction BC with its unbiased estimate, K, based
on 500 sets of simulated data

  $n_i$   $\sigma_b$      BC       K
    5       0.25        6.87    6.53
   15       0.25       10.35   10.43
    5       0.5         9.18    9.09
   15       0.5        11.69   11.45
    5       1          10.19   10.31
   15       1          11.59   11.14
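The simulation design can be sketched as follows. The code generates one data set from the random-intercept model and forms a single Monte Carlo term of the bias correction BC, using replicates $y^*$ that share the same random effects; averaging such terms over many data sets gives the BC estimate. The index $j$ is assumed to run from 1 to $n_i$, and the cluster-mean estimator used for the fitted means is only a hypothetical stand-in for a real mixed-model fit.

```python
import numpy as np
from scipy.special import gammaln


def simulate_dataset(rng, m=10, n_i=5, beta0=1.0, beta1=0.2, sigma_b=0.5):
    """Draw one data set from log mu_ij = beta0 + beta1 * x_j + b_i."""
    cluster = np.repeat(np.arange(m), n_i)
    x = np.tile(np.arange(1.0, n_i + 1), m)   # x_j = j, assuming j = 1, ..., n_i
    u = rng.normal(0.0, sigma_b, size=m)      # true random intercepts
    mu0 = np.exp(beta0 + beta1 * x + u[cluster])
    y = rng.poisson(mu0)
    return y, x, cluster, mu0


def poisson_loglik(y, mu):
    """Conditional Poisson log-likelihood, summed over the last axis."""
    return np.sum(y * np.log(mu) - mu - gammaln(y + 1.0), axis=-1)


def bias_term(rng, y, mu_hat, mu0, n_rep=500):
    """log g(y | fit) minus the Monte Carlo mean of log g(y* | fit).

    The replicates y* are drawn from Pois(mu0), so they share the random
    effects of the observed data.
    """
    y_star = rng.poisson(mu0, size=(n_rep, mu0.size))
    return poisson_loglik(y, mu_hat) - poisson_loglik(y_star, mu_hat).mean()


rng = np.random.default_rng(2)
m, n_i = 10, 5
y, x, cluster, mu0 = simulate_dataset(rng, m=m, n_i=n_i, sigma_b=0.5)

# Hypothetical stand-in estimator: cluster means, floored to avoid log(0).
cluster_means = np.bincount(cluster, weights=y, minlength=m) / n_i
mu_hat = np.maximum(cluster_means, 1e-8)[cluster]

term = bias_term(rng, y, mu_hat, mu0)
print(term)
```

A single term is noisy, which is why the comparison in Table 1 averages over 500 data sets.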

The results are shown in Table 1. The estimated bias corrections are close to the true
values. In general, the 'effective number of parameters' increases with the variance
of the random effects. The same comparison can be made for mAIC and
also for fixed-effects models, but we found in our simulations that the estimator
$K$ in those cases is very close to the number of population parameters, so there
appears to be no advantage in using our estimator, which only increases the
computational burden.
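A simple worked case, added here as an illustration and not taken from the simulations, shows why $K$ stays close to the parameter count for fixed-effects models. Consider an intercept-only Poisson model fitted by maximum likelihood, so every fitted response equals the sample mean. Write $S = \sum_i y_i$, assume $S \ge 2$, and let $y^{(i,\,y_i-1)}$ denote $y$ with its $i$-th component reduced by one.

```latex
\hat y_i(y) = \frac{S}{N}, \qquad
\hat y_i\bigl(y^{(i,\,y_i-1)}\bigr) = \frac{S-1}{N},
\\[4pt]
K = \sum_{i:\, y_i > 0} y_i \left[ \log\frac{S}{N} - \log\frac{S-1}{N} \right]
  = -S \log\!\left(1 - \frac{1}{S}\right)
  = 1 + \frac{1}{2S} + \frac{1}{3S^2} + \cdots
```

So $K$ exceeds the single model parameter only by a term of order $1/S$, which is negligible once the total count is moderately large.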

To illustrate the differences between marginal and conditional inference, we
use the same setup as before with $(\beta_0, \beta_1) = (1, 0.2)$, $n_i = 5$, $\sigma_b = 0.125$, 0.25 and
0.5. Laplace approximation is used to approximate the marginal likelihood in
the calculation of mAIC, for which the bias is simply estimated by the number
of population parameters, 3 in this case. Also, a fixed-effects model $\log \mu_{ij} =
\beta_0 + \beta_1 x_j$ is fitted to the data and the standard AIC is computed. The values of AIC,
mAIC and cAIC are shown in Figure 1 for different random effect variances.
These values are averages over 500 sets of data simulated from the model. The
difference between mAIC and cAIC is most obvious when $\sigma_b = 0.125$. In fact,
when $\sigma_b = 0.125$, the fixed-effects model is
selected for 395 of the 500 data sets when comparing AIC with mAIC, while it is
selected for only 3 data sets when comparing AIC with cAIC. When $\sigma_b = 0.25$,
the fixed-effects model is selected for 165 of the 500 data sets using mAIC, while it
is selected only once when using cAIC.
4 Concluding remarks

Previous studies of conditional AIC have been limited to linear mixed-effects
models. We provided the corresponding cAIC for Poisson regression. Since
the derivation of the estimator does not depend on either the normality of ran- 
dom effects or specific estimators used for the fixed and random effects, the same 
formula works in more general contexts such as when using the approach of
hierarchical likelihood [5], which has become very popular in recent years. Although
a general methodology seems to be lacking for generalized linear mixed-effects 
models, we believe that some approximation is possible and investigations in 
this direction are underway. 



Appendix 



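Proof of Theorem 1. Suppose that the true conditional likelihood is $f(y|u) =
\prod_{i=1}^{N} e^{-\mu_{0i}} \mu_{0i}^{y_i} / y_i!$, where $\mu_0 = (\mu_{01}, \dots, \mu_{0N})$ depends on the random effects
$u$. Let $\hat y$ be the fitted responses from the mixed-effects model. The conditional
Akaike information is

$$\begin{aligned}
\mathrm{cAI} &= -2 E_{f(y,u)} E_{f(y^*|u)} \log g(y^*|\hat\theta, \hat b) \\
&= -2 E_{f(y,u)} E_{f(y^*|u)} \Big[ \sum_i \big( -\hat y_i + y_i^* \log \hat y_i - \log y_i^*! \big) \Big] \\
&= -2 E_{f(y,u)} \Big[ \sum_i \big( -\hat y_i + \mu_{0i} \log \hat y_i - E_{f(y^*|u)} \log y_i^*! \big) \Big].
\end{aligned}$$

Meanwhile,

$$-2 E_{f(y,u)} \log g(y|\hat\theta, \hat b) = -2 E_{f(y,u)} \Big[ \sum_i \big( -\hat y_i + y_i \log \hat y_i - \log y_i! \big) \Big].$$

Since $y$ and $y^*$ have the same conditional distribution given $u$, the terms
involving $\log y_i!$ and $\log y_i^*!$ cancel in expectation. Thus

$$\mathrm{cAI} - \big( -2 E_{f(y,u)} \log g(y|\hat\theta, \hat b) \big) = 2 E_{f(y,u)} \Big[ \sum_i (y_i - \mu_{0i}) \log \hat y_i \Big].$$

In addition, we have that

$$\begin{aligned}
E_{f(y|u)} \big\{ \mu_{0i} \log[\hat y_i(y)] \big\}
&= \sum_{y_j,\, j \ne i} \; \sum_{y_i=0}^{\infty} \mu_{0i} \log[\hat y_i(y)] \, \frac{e^{-\mu_{0i}} \mu_{0i}^{y_i}}{y_i!} \prod_{j \ne i} \frac{e^{-\mu_{0j}} \mu_{0j}^{y_j}}{y_j!} \\
&= \sum_{y_j,\, j \ne i} \; \sum_{z_i=1}^{\infty} \log[\hat y_i(y^{(i,\,z_i-1)})] \, \frac{e^{-\mu_{0i}} \mu_{0i}^{z_i}}{(z_i-1)!} \prod_{j \ne i} \frac{e^{-\mu_{0j}} \mu_{0j}^{y_j}}{y_j!} \\
&= E_{f(y|u)} \big\{ y_i \log[\hat y_i(y^{(i,\,y_i-1)})] \big\},
\end{aligned}$$

where $y^{(i,\,z_i-1)}$ is the vector $y$ whose $i$-th component has been replaced by $z_i - 1$,
and similarly for $y^{(i,\,y_i-1)}$.
Therefore,

$$\begin{aligned}
\mathrm{cAI} - \big( -2 E_{f(y,u)} \log g(y|\hat\theta, \hat b) \big)
&= 2 E_{p(u)} E_{f(y|u)} \Big\{ \sum_{i=1}^{N} (y_i - \mu_{0i}) \log \hat y_i \Big\} \\
&= 2 E_{f(y,u)} \Big\{ \sum_{i=1}^{N} y_i \log \hat y_i - y_i \log[\hat y_i(y^{(i,\,y_i-1)})] \Big\}.
\end{aligned}$$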
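The key step of the proof is the Poisson identity $E[\mu\, h(Y)] = E[Y\, h(Y - 1)]$ for $Y \sim \mathrm{Pois}(\mu)$, applied coordinate-wise with $h$ the log of the fitted response. A quick numerical check for a single coordinate is sketched below; the test function `h` and the truncation point are arbitrary choices made here for illustration.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability mass at k, computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def lhs(mu, h, kmax=200):
    """E[mu * h(Y)], truncating the sum at kmax."""
    return sum(mu * h(k) * poisson_pmf(k, mu) for k in range(kmax))


def rhs(mu, h, kmax=200):
    """E[Y * h(Y - 1)]; the k = 0 term is zero, matching the convention."""
    return sum(k * h(k - 1) * poisson_pmf(k, mu) for k in range(1, kmax))


def h(k):
    """Arbitrary test function standing in for the log fitted response."""
    return math.log(k + 0.5)


for mu in (0.3, 3.0, 12.0):
    print(mu, lhs(mu, h), rhs(mu, h))
```

The two sums agree up to the truncated tail, which is negligible for the means used here.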